Neural network model optimization method based on annealing process for stainless steel ultra-thin strip

ABSTRACT

Disclosed is a neural network model optimization method based on an annealing process for a stainless steel ultra-thin strip, which belongs to the technical field of data analysis. Aiming at defects such as the size effect arising from the difference between the stainless steel ultra-thin strip and material of macroscopic size and the poor adaptability of common methods for predicting the mechanical properties of stainless steel, and exploiting the good capability of a neural network for processing complex nonlinear problems, the method models the various factors influencing annealing of the stainless steel ultra-thin strip with an artificial neural network, thereby predicting and controlling the mechanical properties and the microstructure after annealing.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of International Patent Application No. PCT/CN2022/116856 with a filing date of Sep. 2, 2022, designating the United States, now pending, and further claims priority to Chinese Patent Application No. 202210220808.5 with a filing date of Mar. 8, 2022. The content of the aforementioned applications, including any intervening amendments thereto, is incorporated herein by reference.

TECHNICAL FIELD

The present invention belongs to the field of data analysis technologies, and particularly relates to a neural network model optimization method based on an annealing process for a stainless steel ultra-thin strip.

BACKGROUND OF THE PRESENT INVENTION

Stainless steel refers to steel resistant to weak corrosive media, such as air, steam and water, and to chemical corrosive media, such as acid, alkali and salt. Stainless steel has good corrosion resistance, comprehensive performance and process performance. With the wide application of precision ultra-thin stainless steel materials, such as coil springs, stamping members, mobile phone screen masks, glasses frames, ear hoops, mobile phone vibrators and precision robots, the market demand on the comprehensive performance of stainless steel is increasingly strict. Therefore, the yield strength, tensile strength, elongation and hardness of the material need to be predicted from accumulated annealing process parameters when an annealing process for a stainless steel ultra-thin strip is formulated. Researchers have found that, with the reduction of the size of a workpiece, the stress-strain relationship, formability, friction coefficient and other parameters of metal materials show characteristics significantly different from those of a workpiece of macroscopic size, which is usually called a size effect. When the thickness of an ultra-thin strip sample reaches an order of several microns to tens of microns, there will be only one layer of crystal grains in the thickness direction of the workpiece after annealing, namely a single-layer crystal. In addition, the grain size and the sample thickness of metal materials both affect dislocation movement and the evolution of texture orientation, so that the yield strength and the tensile strength of the material behave differently from those of materials of macroscopic size. In studying the size effect of metal materials, it was found that the strength shows two completely opposite trends as the grain size and the sample size are reduced: in one regime, the smaller the grain size and the sample size are, the weaker the strength is, while in the other, the smaller they are, the stronger the strength is. When the sample size and the grain size are in the order of μm, the smaller the grain size and the sample size are, the weaker the strength is; while when the sample is a single crystal or the sample size is in the order of nm, the smaller the grain size and the sample size are, the stronger the strength is. At present, the related mechanisms are not clear enough, so that the size effect still needs to be further studied.

At present, a continuous annealing process is mainly used for annealing steel in a stainless steel strip, and the process flow is as follows: feeding and receiving procedures→cloth clamping device→looper device→front cooling water jacket→Muffle tube annealing furnace section→rear cooling water jacket→winding and unloading, which is suitable for mass production. Therefore, it is necessary to select an appropriate annealing process according to the customer's requirements before production. There are many factors affecting the heat treatment, such as the stainless steel grade, the chemical composition of the stainless steel, the thickness of the strip steel, the degree of cold deformation, the original grain size, the annealing temperature, the heat preservation time, the annealing atmosphere, the heating speed and the cooling speed, with complex influencing relationships. The control of mechanical performances after annealing is mainly determined by empirical formulas, but this method lacks adaptability to changes of the different influencing factors. If the study is instead performed by experiments, it will take a large number of experiments, consume a lot of manpower, material resources and financial resources, and cannot adapt to intelligent control.

Artificial intelligence refers to the processing and utilization of information by simulating some intelligent mechanisms of human beings, some natural phenomena or intelligent behaviors of organisms. This kind of algorithm is intuitive and rich in natural mechanisms when constructed. In the field of artificial intelligence, there are many problems for which optimal solutions or quasi-optimal solutions need to be found in a complex and huge search space. An intelligent optimization algorithm is an algorithm produced in this background and proved to be particularly effective by practice. Traditional intelligent optimization algorithms comprise an evolutionary algorithm, particle swarm optimization, a tabu algorithm, simulated annealing, an ant colony algorithm, a genetic algorithm, artificial neural network technology and the like. These algorithms are all widely applied in banking, machinery, mining, social science and other industries and disciplines.

By simulating the brain of human beings, a neural network is formed by connecting multiple neurons, which can flexibly deal with complex nonlinear problems among input, storage and output. The neural network is characterized by a strong adaptive learning capability, accurate prediction and good robustness, and can better realize information prediction and control. The excellent nonlinear approximation performance of the neural network makes it perform well in many fields, such as pattern classification, clustering, regression and fitting, and optimization calculation. In recent years, the neural network has been applied to solve optimization problems of nonlinear process parameters during steel rolling and annealing.

SUMMARY OF PRESENT INVENTION

Aiming at the optimization problem of nonlinear process parameters in an annealing process of a stainless steel ultra-thin strip, the present invention provides a neural network model optimization method based on an annealing process for a stainless steel ultra-thin strip.

Aiming at the nonlinearity and complexity of annealing of the stainless steel ultra-thin strip, the present invention is intended to model the annealing process parameters by artificial neural network technology, an important component of artificial intelligence, which has a strong adaptive learning capability, accurate prediction and good robustness, and can better realize information prediction and control.

In order to achieve the above objective, the following technical solution is used in the present invention.

A neural network model optimization method based on an annealing process for a stainless steel ultra-thin strip is provided, wherein an error back propagation algorithm is employed to train a single hidden layer neural network, comprising:

-   step 1: designing a network model, and determining a number of layers of the network, a number of nodes of an input layer, a number of nodes of an output layer and a number of nodes of a hidden layer;
-   step 2: selecting a transfer function, a training method and training parameters;
-   step 3: selecting sample data according to the step 2, dividing the sample data into a training set and a testing set, and performing data preprocessing;
-   step 4: setting and initializing parameters of the neural network;
-   step 5: adjusting forward propagation of a working signal of the neural network;
-   step 6: adjusting back propagation of an error signal of the neural network;
-   step 7: calculating an error value matrix and a Jacobian matrix;
-   step 8: updating a weight and a threshold of the neural network; and
-   step 9: performing error calculation and neural network testing.

The error back propagation algorithm is used for learning; the learning process of the neural network is to adjust the weight between neurons and the threshold of each functional neuron according to the training data; in the neural network (BP network), the working signal is forwardly propagated layer by layer from the input layer through the hidden layer, and when the weight and the threshold of the network are trained, the error signal is reversely propagated, and the connection weights and connection thresholds of the network are corrected layer by layer from the output layer back through the middle layer; and with the deepening of learning, the final error becomes smaller and smaller.

Further, a multi-layer network with one hidden layer is used. The multi-layer neural network with the single hidden layer gives the network a better capability to deal with nonlinear problems; the multi-layer neural network comprises the input layer, the output layer and the hidden layer, adjacent layers are connected with each other, and neurons of the same layer are not connected with each other, wherein neurons of the input layer receive an external input, neurons of the hidden layer and the output layer process the signal, and finally neurons of the output layer output the signal; and the multi-layer network design enables the network to mine more information from the input sample data, thus finishing a more complex task.

Further, the selecting the sample data according to the step 2, dividing the sample data into the training set and the testing set, and performing the data preprocessing in the step 3, further comprises the following steps of:

-   step 3.1: dividing the sample data into the training set and the testing set; and
-   step 3.2: normalizing the samples in the training set and the testing set.

Further, a specific method for the normalizing the samples of the training set and the testing set in the step 3.2 comprises: mapping data to [0, 1] or [−1, 1] by using a mapminmax function, and recording an input in a data set as x and an output in the data set as o;

-   that is: normalizing the samples to [0, 1] by a formula u_(M)(1)=(x−x_(min))/(x_(max)−x_(min)), and normalizing the samples to [−1, 1] by a formula u_(M)(1)=2*(x−x_(min))/(x_(max)−x_(min))−1, wherein x represents an input, which is generally a sample data value, and u_(M)(1) represents an initial input value of the network; and
-   similarly, normalizing the output o to obtain an expected output d(n) of the network, wherein x_(max) represents a maximum input value, and x_(min) represents a minimum input value.

Further, a specific method for the setting and initializing the parameters of the neural network in the step 4 comprises: employing a three-layer neural network, setting the transfer function of the hidden layer as a Sigmoid function, and setting the transfer function of the output layer as a linear function; and representing an input and an output of each layer with u and v, wherein:

-   an input of the input layer is u_(M) ^(m)(n) and an output of the input layer is v_(M) ^(m)(n);
-   an input of the hidden layer is u_(I) ^(i)(n) and an output of the hidden layer is v_(I) ^(i)(n);
-   an input of the output layer is u_(J) ^(j)(n) and an output of the output layer is v_(J) ^(j)(n);
-   a number of neurons of the input layer is M and an m^(th) neuron of the input layer is recorded as x_(m);
-   a number of neurons of the hidden layer is I and an i^(th) neuron of the hidden layer is recorded as k_(i);
-   a number of neurons of the output layer is J and a j^(th) neuron of the output layer is recorded as y_(j);
-   a connection weight from x_(m) to k_(i) is ω_(mi) ¹ and a connection threshold is b_(i) ¹;
-   a connection weight from k_(i) to y_(j) is ω_(ij) ² and a connection threshold is b_(j) ²;

an input signal of the network is denoted as u_(M)(n)=[u_(M) ¹,u_(M) ²,. . . ,u_(M) ^(M)]′;

-   an actual output of the network is denoted as Y(n)=[v_(J) ¹, v_(J) ², . . . , v_(J) ^(J)];
-   an expected output of the network is denoted as d(n)=[d₁, d₂, . . . , d_(J)];
-   wherein n represents a number of iterations, and d represents an output value of the sample data;
-   an error of the j^(th) neuron of the output layer in an n^(th) iteration is denoted as e_(j)(n)=d_(j)(n)−Y_(j)(n), and a total error is denoted as

${{E(n)} = {\frac{1}{2}{\sum_{j = 1}^{J}{e_{j}^{2}(n)}}}},$

wherein e represents the error;

-   -   a weight matrix W¹ between the neuron of the input layer and the        neuron of the hidden layer is:

${{W^{1}(n)} = \begin{bmatrix}\omega_{11}^{1} & \omega_{12}^{1} & \cdots & \omega_{1i}^{1} & \cdots & \omega_{1I}^{1} \\\omega_{21}^{1} & \omega_{22}^{1} & \cdots & \omega_{2i}^{1} & \cdots & \omega_{2I}^{1} \\ \vdots & \vdots & & \vdots & & \vdots \\\omega_{m1}^{1} & \omega_{m2}^{1} & \cdots & \omega_{mi}^{1} & \cdots & \omega_{mI}^{1} \\ \vdots & \vdots & & \vdots & & \vdots \\\omega_{M1}^{1} & \omega_{M2}^{1} & \cdots & \omega_{Mi}^{1} & \cdots & \omega_{MI}^{1}\end{bmatrix}};$

-   -   a weight matrix W² between the neuron of the hidden layer and        the neuron of the output layer is:

${{W^{2}(n)} = \begin{bmatrix}\omega_{11}^{2} & \omega_{12}^{2} & \cdots & \omega_{1j}^{2} & \cdots & \omega_{1J}^{2} \\\omega_{21}^{2} & \omega_{22}^{2} & \cdots & \omega_{2j}^{2} & \cdots & \omega_{2J}^{2} \\ \vdots & \vdots & & \vdots & & \vdots \\\omega_{i1}^{2} & \omega_{i2}^{2} & \cdots & \omega_{ij}^{2} & \cdots & \omega_{iJ}^{2} \\ \vdots & \vdots & & \vdots & & \vdots \\\omega_{I1}^{2} & \omega_{I2}^{2} & \cdots & \omega_{Ij}^{2} & \cdots & \omega_{IJ}^{2}\end{bmatrix}};$

-   -   a threshold b¹(n) of the neuron of the hidden layer is:        b¹(n)=[b₁ ¹,b₂ ¹, . . . , b_(i) ¹]′;    -   a threshold b²(n) of the neuron of the output layer is:        b²(n)=[b₁ ²,b₂ ², . . . , b_(j) ²]′;

Further, a specific method for the forward propagation of the working signal of the neural network in the step 5 comprises:

-   setting the output of the input layer to be equal to the input signal of the network: v_(M) ^(m)(n)=u_(M) ^(m)(n);
-   setting the input of the i^(th) neuron of the hidden layer to be equal to a weighted sum of the output of the input layer: u_(I) ^(i)(n)=Σ_(m=1) ^(M)ω_(mi) ¹(n)v_(M) ^(m)(n)−b_(i) ¹(n); and setting the output of the i^(th) neuron of the hidden layer to be equal to the transfer function of the hidden layer applied to this input: v_(I) ^(i)(n)=f(u_(I) ^(i)(n)), wherein f(⋅) is the transfer function of the hidden layer;
-   setting the input of the j^(th) neuron of the output layer to be equal to a weighted sum of the output of the hidden layer: u_(J) ^(j)(n)=Σ_(i=1) ^(I)ω_(ij) ²(n)v_(I) ^(i)(n)−b_(j) ²(n); and setting the output of the j^(th) neuron of the output layer to be equal to the transfer function of the output layer applied to this input: v_(J) ^(j)(n)=g(u_(J) ^(j)(n)), wherein g(⋅) is the transfer function of the output layer;
-   so, an error of the j^(th) neuron of the output layer is equal to: e_(j)(n)=d_(j)(n)−v_(J) ^(j)(n); and
-   a total error of the network is denoted as:

${E(n)} = {{\frac{1}{2}{\sum\limits_{j = 1}^{J}{e_{j}^{2}(n)}}} = {\frac{1}{2}{\sum\limits_{j = 1}^{J}{\left\{ {{d_{j}(n)} - {g\left\lbrack {{\sum\limits_{i = 1}^{I}{{\omega_{ij}^{2}(n)}{f\left( {{\sum\limits_{m = 1}^{M}{{\omega_{mi}^{1}(n)}{v_{M}^{m}(n)}}} - {b_{i}^{1}(n)}} \right)}}} - {b_{j}^{2}(n)}} \right\rbrack}} \right\}^{2}.}}}}$

Further, a specific method for the back propagation of the error signal of the neural network in the step 6 comprises:

-   step 6.1: in a weight and threshold adjustment stage, reversely adjusting layer by layer along the neural network, and adjusting the weight ω_(ij) ² and the threshold b_(j) ² between the hidden layer and the output layer first;
-   a partial derivative of the total error to the weight ω_(ij) ² between the hidden layer and the output layer being:

${\frac{\partial{E(n)}}{\partial{\omega_{ij}^{2}(n)}} = {{\frac{\partial{E(n)}}{\partial{e_{j}(n)}} \cdot \frac{\partial{e_{j}(n)}}{\partial{v_{J}^{j}(n)}} \cdot \frac{\partial{v_{J}^{j}(n)}}{\partial{u_{J}^{j}(n)}} \cdot \frac{\partial{u_{J}^{j}(n)}}{\partial{\omega_{ij}^{2}(n)}}} = {{{e_{j}(n)} \cdot \left( {- 1} \right) \cdot {g^{\prime}\left( {u_{J}^{j}(n)} \right)} \cdot {v_{I}^{i}(n)}} = {{- {e_{j}(n)}}{g^{\prime}\left( {u_{J}^{j}(n)} \right)}{v_{I}^{i}(n)}}}}},$

-   -   a partial derivative of the total error to the threshold b_(j) ²        between the hidden layer and the output layer being:

${\frac{\partial{E(n)}}{\partial{b_{j}^{2}(n)}} = {{\frac{\partial{E(n)}}{\partial{e_{j}(n)}} \cdot \frac{\partial{e_{j}(n)}}{\partial{v_{J}^{j}(n)}} \cdot \frac{\partial{v_{J}^{j}(n)}}{\partial{u_{J}^{j}(n)}} \cdot \frac{\partial{u_{J}^{j}(n)}}{\partial{b_{j}^{2}(n)}}} = {{{e_{j}(n)} \cdot \left( {- 1} \right) \cdot {g^{\prime}\left( {u_{J}^{j}(n)} \right)} \cdot \left( {- 1} \right)} = {{e_{j}(n)}{g^{\prime}\left( {u_{J}^{j}(n)} \right)}}}}},$

-   -   a local gradient being:

${\delta_{J}^{j} = {{- \frac{\partial{E(n)}}{\partial{u_{J}^{j}(n)}}} = {{{- \frac{\partial{E(n)}}{\partial{e_{j}(n)}}} \cdot \frac{\partial{e_{j}(n)}}{\partial{v_{J}^{j}(n)}} \cdot \frac{\partial{v_{J}^{j}(n)}}{\partial{u_{J}^{j}(n)}}} = {{e_{j}(n)}{g^{\prime}\left( {u_{J}^{j}(n)} \right)}}}}},$

-   wherein g′(⋅) represents a derivative of the transfer function g(⋅) of the output layer; and
-   step 6.2: continuing to propagate the error signal backwards toward the input layer, and adjusting the weight ω_(mi) ¹ and the threshold b_(i) ¹ between the input layer and the hidden layer;
-   a partial derivative of the total error to the weight ω_(mi) ¹ between the input layer and the hidden layer being:

$\frac{\partial{E(n)}}{\partial\omega_{mi}^{1}} = {{\frac{\partial{E(n)}}{\partial{e_{j}(n)}} \cdot \frac{\partial{e_{j}(n)}}{\partial{v_{J}^{j}(n)}} \cdot \frac{\partial{v_{J}^{j}(n)}}{\partial{u_{J}^{j}(n)}} \cdot \frac{\partial{u_{J}^{j}(n)}}{\partial{v_{I}^{i}(n)}} \cdot \frac{\partial{v_{I}^{i}(n)}}{\partial{u_{I}^{i}(n)}} \cdot \frac{\partial{u_{I}^{i}(n)}}{\partial\omega_{mi}^{1}}} = {- {\sum\limits_{j = 1}^{J}{\left( {\delta_{J}^{j} \cdot {\omega_{ij}^{2}(n)}} \right) \cdot {f^{\prime}\left( {u_{I}^{i}(n)} \right)} \cdot {{v_{M}^{m}(n)}.}}}}}$

-   -   a partial derivative of the total error to the threshold b_(i) ¹        between the input layer and the hidden layer being:

${\frac{\partial{E(n)}}{\partial{b_{i}^{1}(n)}} = {{\frac{\partial{E(n)}}{\partial{e_{j}(n)}} \cdot \frac{\partial{e_{j}(n)}}{\partial{v_{J}^{j}(n)}} \cdot \frac{\partial{v_{J}^{j}(n)}}{\partial{u_{J}^{j}(n)}} \cdot \frac{\partial{u_{J}^{j}(n)}}{\partial{v_{I}^{i}(n)}} \cdot \frac{\partial{v_{I}^{i}(n)}}{\partial{u_{I}^{i}(n)}} \cdot \frac{\partial{u_{I}^{i}(n)}}{\partial{b_{i}^{1}(n)}}} = {- {\sum\limits_{j = 1}^{J}{\left( {\delta_{J}^{j} \cdot {\omega_{ij}^{2}(n)}} \right) \cdot {f^{\prime}\left( {u_{I}^{i}(n)} \right)} \cdot \left( {- 1} \right)}}}}},$

-   -   a local gradient being:

${\delta_{I}^{i} = {{- \frac{\partial{E(n)}}{\partial{u_{I}^{i}(n)}}} = {{{- \frac{\partial{E(n)}}{\partial{v_{I}^{i}(n)}}} \cdot \frac{\partial{v_{I}^{i}(n)}}{\partial{u_{I}^{i}(n)}}} = {{{- \frac{\partial{E(n)}}{\partial{v_{I}^{i}(n)}}} \cdot {f^{\prime}\left( {u_{I}^{i}(n)} \right)}} = {\sum\limits_{j = 1}^{J}{\left( {\delta_{J}^{j} \cdot {\omega_{ij}^{2}(n)}} \right) \cdot {f^{\prime}\left( {u_{I}^{i}(n)} \right)}}}}}}},$

-   wherein f′(⋅) represents a derivative of the transfer function f(⋅) of the hidden layer; and
-   the local gradient of a neuron is equal to the product of the error signal of the neuron and the derivative of the transfer function;
-   so, the partial derivatives with respect to the weights and the thresholds are denoted with the local gradients as:

${\frac{\partial{E(n)}}{\partial{\omega_{ij}^{2}(n)}} = {{- \delta_{J}^{j}}{v_{I}^{i}(n)}}};$${\frac{\partial{E(n)}}{\partial{b_{j}^{2}(n)}} = \delta_{J}^{j}};$${\frac{\partial{E(n)}}{\partial{\omega_{mi}^{1}(n)}} = {{- \delta_{I}^{i}}{v_{M}^{m}(n)}}};$$\frac{\partial{E(n)}}{\partial{b_{i}^{1}(n)}} = {\delta_{I}^{i}.}$

Further, a specific method for the calculating the error value matrix and the Jacobian matrix in the step 7 comprises:

-   -   denoting an error value matrix of Q samples as:

${{e(n)} = \begin{bmatrix}{e_{11}(n)} & {e_{12}(n)} & \cdots & {e_{1q}(n)} & \cdots & {e_{1Q}(n)} \\{e_{21}(n)} & {e_{22}(n)} & \cdots & {e_{2q}(n)} & \cdots & {e_{2Q}(n)} \\ \vdots & \vdots & & \vdots & & \vdots \\{e_{j1}(n)} & {e_{j2}(n)} & \cdots & {e_{jq}(n)} & \cdots & {e_{jQ}(n)} \\ \vdots & \vdots & & \vdots & & \vdots \\{e_{J1}(n)} & {e_{J2}(n)} & \cdots & {e_{Jq}(n)} & \cdots & {e_{JQ}(n)}\end{bmatrix}};$

-   -   denoting an element of the Jacobian matrix as:

${{J_{jq}(n)} = \begin{bmatrix}\frac{\partial{e_{jq}(n)}}{\partial{\omega_{11}^{2}(n)}} & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{12}^{2}(n)}} & \cdots & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{1j}^{2}(n)}} & \cdots & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{1J}^{2}(n)}} \\\frac{\partial{e_{jq}(n)}}{\partial{\omega_{21}^{2}(n)}} & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{22}^{2}(n)}} & \cdots & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{2j}^{2}(n)}} & \cdots & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{2J}^{2}(n)}} \\ \vdots & \vdots & & \vdots & & \vdots \\\frac{\partial{e_{jq}(n)}}{\partial{\omega_{i1}^{2}(n)}} & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{i2}^{2}(n)}} & \cdots & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{ij}^{2}(n)}} & \cdots & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{iJ}^{2}(n)}} \\ \vdots & \vdots & & \vdots & & \vdots \\\frac{\partial{e_{jq}(n)}}{\partial{\omega_{I1}^{2}(n)}} & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{I2}^{2}(n)}} & \cdots & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{Ij}^{2}(n)}} & \cdots & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{IJ}^{2}(n)}}\end{bmatrix}};$

-   -   denoting a structure of the Jacobian matrix as:

${{J(n)} = \begin{bmatrix}{J_{11}(n)} & {J_{12}(n)} & \cdots & {J_{1q}(n)} & \cdots & {J_{1Q}(n)} \\{J_{21}(n)} & {J_{22}(n)} & \cdots & {J_{2q}(n)} & \cdots & {J_{2Q}(n)} \\ \vdots & \vdots & & \vdots & & \vdots \\{J_{j1}(n)} & {J_{j2}(n)} & \cdots & {J_{jq}(n)} & \cdots & {J_{jQ}(n)} \\ \vdots & \vdots & & \vdots & & \vdots \\{J_{J1}(n)} & {J_{J2}(n)} & \cdots & {J_{Jq}(n)} & \cdots & {J_{JQ}(n)}\end{bmatrix}};$

-   -   similarly, obtaining the Jacobian matrix of the weights of the        input layer and the hidden layer;

H is a Hessian matrix of an error performance function, which contains second derivative information of the error function; when the error performance function has the form of a square sum error, the Hessian matrix is approximately denoted as H=J^(T)J; and a gradient is denoted as g=J^(T)e, wherein J is a Jacobian matrix of a first derivative of the error performance function with respect to the weights of the network.

Further, a specific method for the updating the weight and the threshold of the neural network in the step 8 comprises:

-   adjustment amount Δω=learning rate η·local gradient δ·output signal of previous layer v;
-   due to uncertain reversibility of J^(T)J, a unit matrix U is introduced to obtain H=J^(T)J+μU, wherein μ is a damping factor;
-   according to a formula ω(n+1)=ω(n)−[J^(T)J+μU]⁻¹J^(T)e, the weight and the threshold of the LM (Levenberg-Marquardt) algorithm are corrected; and when μ=0, the LM algorithm is degenerated into a Newton method; and
-   a weight update formula is denoted as:

ω_(ij) ²(n+1)=ω_(ij) ²(n)−[J²(n)^(T)J²(n)+μU]⁻¹ηJ²(n)^(T)e_(j)(n)v_(I) ^(i)(n);

ω_(mi) ¹(n+1)=ω_(mi) ¹(n)−[J²(n)^(T)J²(n)+μU]⁻¹ηJ¹(n)^(T)Σ_(j=1) ^(J)(J²(n)e_(j)(n)ω_(ij) ²(n))v_(I) ^(i)(n)v_(M) ^(m)(n);

and

-   -   a threshold update formula is denoted as:

b_(j) ²(n+1)=b_(j) ²(n)−[J²(n)^(T)J²(n)+μU]⁻¹ηJ²(n)^(T)e_(j)(n)v_(I) ^(i)(n);

b_(i) ¹(n+1)=b_(i) ¹(n)−[J¹(n)^(T)J¹(n)+μU]⁻¹ηJ¹(n)^(T)Σ_(j=1) ^(J)(J²(n)e_(j)(n)ω_(ij) ²(n))v_(I) ^(i)(n)v_(M) ^(m)(n).

The LM algorithm based on numerical optimization optimizes the neural network model; the LM algorithm is one of the most widely applied nonlinear least squares algorithms, which is a combination of a gradient descent method and a Newton method and has the advantages of the two methods at the same time; and the LM algorithm is insensitive to over-parameterized problems and can effectively deal with the redundant parameter problem, thus greatly reducing the chance of the performance function falling into a local minimum. A damping factor is introduced in the LM algorithm; when the damping factor is 0, the LM algorithm is degenerated to the Newton method; and when the damping factor is very large, the LM algorithm is equivalent to the gradient descent method with a small step size.

Further, the step 9 comprises: calculating an error value, and judging whether an MSE error criterion meets an accuracy requirement; when the MSE criterion meets the accuracy requirement, stopping the iteration; when the MSE criterion does not meet the accuracy requirement, continuing the iteration; after finishing training of the neural network, testing the testing set; and obtaining an actual predicted value by inversely normalizing an output result of the network.

Compared with the prior art, the present invention has the following advantages.

1. According to the present invention, a BP neural network prediction model is designed, and the neural network model is optimized in terms of the number of neurons of the hidden layer, the training function and other aspects, thus improving the prediction accuracy of the mechanical performances of stainless steel after annealing.

2. Mechanical performances of 316L stainless steel after annealing are evaluated by a comprehensive quantitative evaluation method of heat processing quality, the optimum process parameters optimized by the BP neural network are compared with the currently used annealing process parameters of a certain enterprise, and the optimized process parameters can significantly improve the mechanical performances of the stainless steel.

3. After optimization, the BP neural network has a good prediction capability and a high prediction accuracy, has a good application effect in a heat treatment production line, and is conducive to obtaining optimum process parameters of the heat treatment with fewer experiments, thus greatly saving manpower, material resources and financial resources.

DESCRIPTION OF THE DRAWINGS

FIG. 1 is a specific work flow chart of the present invention;

FIG. 2 is a schematic diagram of a three-layer BP neural network constructed by the present invention;

FIGS. 3A-3F show graphs of a true value, a simulated value and an absolute error of each group of data of a training set and a testing set; FIG. 3A shows a graph of a true value, a simulated value and an absolute error of a group of data of a training set for yield strength;

FIG. 3B shows a graph of a true value, a simulated value and an absolute error of a group of data of a training set for tensile strength; FIG. 3C shows a graph of a true value, a simulated value and an absolute error of a group of data of a training set for elongation after fracture;

FIG. 3D shows a true value, a simulated value and an absolute error of a group of data of a testing set for yield strength; FIG. 3E shows a true value, a simulated value and an absolute error of a group of data of a testing set for tensile strength; and FIG. 3F shows a true value, a simulated value and an absolute error of a group of data of a testing set for elongation after fracture;

FIGS. 4A-4D show regression curve graphs of the training set, the validation set and the testing set; FIG. 4A shows a regression curve graph of the training set; FIG. 4B shows a regression curve graph of the validation set; FIG. 4C shows a regression curve graph of the testing set; and FIG. 4D shows a regression curve graph of the training set, the validation set and the testing set together; and

FIG. 5 shows a graph of the average relative error for different numbers of neurons of the hidden layer.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

Embodiment 1

A neural network model optimization method based on an annealing process for a stainless steel ultra-thin strip comprises the following steps.

In step 1, a network model is designed, and a number of layers of the network, a number of nodes of an input layer, a number of nodes of a hidden layer and a number of nodes of an output layer are determined.

An annealing experiment is performed on the stainless steel ultra-thin strip; independent variables comprise a heat treatment temperature, a heat preservation time and a sampling direction, and dependent variables comprise a yield strength, a tensile strength, an elongation after fracture and a hardness.

The number of nodes of the input layer depends on the number of dimensions of the input vector. The heat treatment temperature, the heat preservation time and the sampling direction are selected as inputs of the neural network, so the number of nodes of the input layer is 3.

The number of nodes of the output layer is determined according to the abstract model; the yield strength, the tensile strength and the elongation after fracture are selected as outputs of the neural network, so that the number of nodes of the output layer is 3.

The multi-layer neural network may contain one or more hidden layers. The more hidden layers are provided, the stronger the data expression capability is. However, more hidden layers also increase the training cost and easily cause over-fitting.

At present, there is no ideal analytical formula that can be used to determine a reasonable number of nodes of the hidden layer, which is usually adjusted by trial and error in practice. Generally, the determination of the number of nodes follows the two conventions below.

1. If the sample function to be approximated varies over a wide range and drastically, the number of nodes of the hidden layer is expected to be larger.

2. If the accuracy requirement is very high, the number of nodes of the hidden layer should be larger.

Meanwhile, an empirical formula may be used to give an estimated value (a small computational sketch follows the four formulas below).

1. Σ_(i=0) ^(n)C_(M) ^(i)>k, wherein k is the number of samples, M is the number of nodes of the hidden layer, and n is the number of nodes of the input layer. If i>M, it is specified that C_(M) ^(i)=0.

2. M=√(m+n)+a, wherein m and n are respectively the number of nodes of the output layer and the number of nodes of the input layer, and a is an integer between [0, 10].

3. M=log₂ n, wherein n is the number of nodes of the input layer.

4. Kolmogorov theorem: given a continuous function, the function may be accurately realized by a three-layer feedforward neural network. The number of nodes of the input layer and the number of nodes of the output layer are respectively set as n and m, so the number of nodes of the hidden layer is M=2n+1.
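The four estimates above are only starting points for trial and error. As an illustration, the following Python sketch (not part of the original disclosure; the sample count k is an assumed placeholder) computes all four estimates for the 3-input, 3-output network of this embodiment:

```python
from math import comb, log2, sqrt

n_in, n_out = 3, 3      # input / output node counts used in this embodiment
k_samples = 30          # assumed number of training samples (placeholder)

# Convention 1: smallest M with sum_{i=0}^{n} C(M, i) > k, where C(M, i) = 0 for i > M
def nodes_by_sample_count(n, k):
    M = 1
    while sum(comb(M, i) for i in range(min(n, M) + 1)) <= k:
        M += 1
    return M

# Convention 2: M = sqrt(m + n) + a, with a an integer in [0, 10]
def nodes_by_sqrt_rule(m, n, a=4):
    return round(sqrt(m + n)) + a

# Convention 3: M = log2(n)
def nodes_by_log_rule(n):
    return max(1, round(log2(n)))

# Convention 4 (Kolmogorov theorem): M = 2n + 1
def nodes_by_kolmogorov(n):
    return 2 * n + 1

print(nodes_by_sample_count(n_in, k_samples),   # 6 for the assumed k = 30
      nodes_by_sqrt_rule(n_out, n_in),
      nodes_by_log_rule(n_in),
      nodes_by_kolmogorov(n_in))
```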

In step 2, a transfer function, a training method and training parameters are selected.

For the selection of the transfer function, generally, a Sigmoid function is used in the hidden layer, and a linear function is used in the output layer.

For a general curve fitting problem, when the number of weights of the network is less than 100, the optimum training algorithm for the neural network is the LM algorithm.

Training parameters needed by a BP network comprise an initial weight, an initial threshold, a learning rate, a momentum factor, a maximum number of iterations and an error tolerance.

An excessively large or small initial value may affect performance; the initial weight is usually defined as a small non-zero random number, and an empirical range is (−2.4/F, 2.4/F) or (−3/√F, 3/√F), wherein F is the number of neurons connected to the weight input terminal.

A value of the learning rate is between [0, 1], and is 0.01 in theembodiment.

The maximum number of iterations may be 1000 to 10000.

The error tolerance may be 10⁻⁵.

In step 3: sample data are selected, divided into a training set and atesting set, and subjected to data preprocessing.

The sample data are divided into the training set and the testing set.

Samples in the training set and the testing set are normalized.

In order to ensure the training effect, the samples must be normalized, and the data may be mapped to [0, 1] or [−1, 1] through normalization.

The samples may be normalized by a mapminmax function, and the algorithm principle is as follows (a NumPy sketch of the same mapping is given after the two formulas).

1. y=(x−x_(min))/(x_(max)−x_(min)), the samples are normalized to [0,1].

2. y=2*(x−x_(min))/(x_(max)−x_(min))−1, the samples are normalized to[−1, 1].
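As a concrete illustration of the two formulas above, the sketch below is a hand-rolled NumPy equivalent of the mapminmax principle (it is not MATLAB's mapminmax itself); it maps each column to [−1, 1] and keeps the column extrema so that the mapping can be inverted after prediction. The sample values are illustrative only:

```python
import numpy as np

def minmax_normalize(x, lo=-1.0, hi=1.0):
    """Map each column of x linearly onto [lo, hi]; also return the column
    extrema needed later for inverse normalization."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)   # guard against constant columns
    y = (hi - lo) * (x - x_min) / span + lo
    return y, x_min, x_max

# Illustrative samples: heat treatment temperature (deg C) and heat preservation time (min)
samples = np.array([[940.0, 1.0],
                    [950.0, 2.5],
                    [960.0, 3.0]])
scaled, x_min, x_max = minmax_normalize(samples)   # each column now lies in [-1, 1]
```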

In step 4: parameters of the neural network are set and initialized.

A three-layer BP network is shown in FIG. 2, and it is assumed that the number of neurons of the input layer is M, the number of neurons of the hidden layer is I, and the number of neurons of the output layer is J. An m^(th) neuron of the input layer is recorded as x_(m), an i^(th) neuron of the hidden layer is recorded as k_(i), and a j^(th) neuron of the output layer is recorded as y_(j). A connection weight from x_(m) to k_(i) is ω_(mi) ¹ and a connection threshold is b_(i) ¹; and a connection weight from k_(i) to y_(j) is ω_(ij) ² and a connection threshold is b_(j) ². The transfer function of the hidden layer is the Sigmoid function, and the transfer function of the output layer is the linear function. An input and an output of each layer are represented with u and v; for example, v_(M) ² represents the output of the 2^(nd) neuron of the M layer (namely the input layer). An actual output of the network is Y(n)=[v_(J) ¹, v_(J) ², . . . , v_(J) ^(J)], and an expected output of the network is d(n)=[d₁, d₂, . . . , d_(J)], wherein n is the number of iterations. An error of the n^(th) iteration is defined as e_(j)(n)=d_(j)(n)−Y_(j)(n) and a total error is

${{E(n)} = {\frac{1}{2}{\sum_{j = 1}^{J}{e_{j}^{2}(n)}}}}.$

An input signal of the network is u_(M)(n)=[u_(M) ¹, u_(M) ², . . . , u_(M) ^(M)]′, and u_(M)(1) represents an initial input value of the network.

A weight matrix W¹ between the neuron of the input layer and the neuron of the hidden layer and a weight matrix W² between the neuron of the hidden layer and the neuron of the output layer are respectively as follows:

${{W^{1}(n)} = \begin{bmatrix}\omega_{11}^{1} & \omega_{12}^{1} & \cdots & \omega_{1i}^{1} & \cdots & \omega_{1I}^{1} \\\omega_{21}^{1} & \omega_{22}^{1} & \cdots & \omega_{2i}^{1} & \cdots & \omega_{2I}^{1} \\ \vdots & \vdots & & \vdots & & \vdots \\\omega_{m1}^{1} & \omega_{m2}^{1} & \cdots & \omega_{mi}^{1} & \cdots & \omega_{mI}^{1} \\ \vdots & \vdots & & \vdots & & \vdots \\\omega_{M1}^{1} & \omega_{M2}^{1} & \cdots & \omega_{Mi}^{1} & \cdots & \omega_{MI}^{1}\end{bmatrix}},$ and ${W^{2}(n)} = {\begin{bmatrix}\omega_{11}^{2} & \omega_{12}^{2} & \cdots & \omega_{1j}^{2} & \cdots & \omega_{1J}^{2} \\\omega_{21}^{2} & \omega_{22}^{2} & \cdots & \omega_{2j}^{2} & \cdots & \omega_{2J}^{2} \\ \vdots & \vdots & & \vdots & & \vdots \\\omega_{i1}^{2} & \omega_{i2}^{2} & \cdots & \omega_{ij}^{2} & \cdots & \omega_{iJ}^{2} \\ \vdots & \vdots & & \vdots & & \vdots \\\omega_{I1}^{2} & \omega_{I2}^{2} & \cdots & \omega_{Ij}^{2} & \cdots & \omega_{IJ}^{2}\end{bmatrix}.}$

A threshold b¹(n) of the neuron of the hidden layer and a threshold b²(n) of the neuron of the output layer are respectively as follows:

b¹(n)=[b₁ ¹, b₂ ¹, . . . , b_(i) ¹]′, b²(n)=[b₁ ², b₂ ², . . . , b_(j) ²]′.
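One possible way to realize this initialization in NumPy is sketched below; the array shapes follow the M-I-J notation above, and the weights are drawn from the small-random-number range (−2.4/F, 2.4/F) mentioned in step 2. The zero thresholds and the fixed random seed are our own assumptions, not requirements of the method:

```python
import numpy as np

def init_parameters(M, I, J, rng=None):
    """Initialize W1 (M x I), W2 (I x J) and the thresholds b1 (I,), b2 (J,)."""
    rng = np.random.default_rng(0) if rng is None else rng
    W1 = rng.uniform(-2.4 / M, 2.4 / M, size=(M, I))  # F = inputs feeding each hidden neuron
    W2 = rng.uniform(-2.4 / I, 2.4 / I, size=(I, J))  # F = hidden neurons feeding each output
    b1 = np.zeros(I)   # hidden layer thresholds b_i^1
    b2 = np.zeros(J)   # output layer thresholds b_j^2
    return W1, W2, b1, b2

# 3 inputs, 17 hidden neurons (one of the better values found later), 3 outputs
W1, W2, b1, b2 = init_parameters(M=3, I=17, J=3)
```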

In step 5, a working signal of the neural network is forwardly propagated.

The output of the input layer is equal to the input signal of the network: v_(M) ^(m)(n)=u_(M) ^(m)(n).

The input of the i^(th) neuron of the hidden layer is equal to a weighted sum of the output of the input layer:

u _(I) ^(i)(n)=Σ_(m=1) ^(M)ω_(mi) ¹(n)v _(M) ^(m)(n)−b _(i) ¹(n).

The output of the i^(th) neuron of the hidden layer is equal to:

v _(I) ^(i)(n)=f(u _(I) ^(i)(n)).

f(⋅) is the transfer function of the hidden layer, which is generally the Sigmoid function.

The input of the j^(th) neuron of the output layer is equal to a weighted sum of the output of the hidden layer:

u _(J) ^(j)(n)=Σ_(i=1) ^(I)ω_(ij) ²(n)v _(I) ^(i)(n)−b _(j) ²(n).

The output of the j^(th) neuron of the output layer is equal to:

v _(J) ^(j)(n)=g(u _(J) ^(j)(n)).

g(⋅) is the transfer function of the output layer, which is generally the linear function.

An error of the j^(th) neuron of the output layer is equal to:

e _(j)(n)=d _(j)(n)−v _(J) ^(j)(n).

A total error of the network is:

${E(n)} = {{\frac{1}{2}{\sum\limits_{j = 1}^{J}{e_{j}^{2}(n)}}} = {\frac{1}{2}{\sum\limits_{j = 1}^{J}{\left\{ {{d_{j}(n)} - {g\left\lbrack {{\sum\limits_{i = 1}^{I}{{\omega_{ij}^{2}(n)}{f\left( {{\sum\limits_{m = 1}^{M}{{\omega_{mi}^{1}(n)}{v_{M}^{m}(n)}}} - {b_{i}^{1}(n)}} \right)}}} - {b_{j}^{2}(n)}} \right\rbrack}} \right\}^{2}.}}}}$
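A compact NumPy sketch of this forward pass is given below (our own illustration, not code from the disclosure). Following the formulas above, the thresholds are subtracted from the weighted sums, the hidden layer uses the sigmoid function f and the output layer is linear:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def forward(x, W1, W2, b1, b2):
    """Forward propagation for one input sample x (length M)."""
    u_I = x @ W1 - b1      # u_I^i(n) = sum_m w_mi^1(n) v_M^m(n) - b_i^1(n)
    v_I = sigmoid(u_I)     # v_I^i(n) = f(u_I^i(n)), sigmoid hidden layer
    u_J = v_I @ W2 - b2    # u_J^j(n) = sum_i w_ij^2(n) v_I^i(n) - b_j^2(n)
    v_J = u_J              # v_J^j(n) = g(u_J^j(n)), linear output layer
    return u_I, v_I, v_J

def network_error(d, v_J):
    """Per-output errors e_j(n) = d_j(n) - v_J^j(n) and total error E(n)."""
    e = d - v_J
    return e, 0.5 * np.sum(e ** 2)
```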

In step 6, an error signal of the neural network is reversely propagated.

1. In a weight and threshold adjustment stage, the weight and the threshold are reversely adjusted layer by layer along the neural network, and the weight ω_(ij) ² and the threshold b_(j) ² between the hidden layer and the output layer are adjusted first.

A partial derivative of the total error to the weight ω_(ij) ² between the hidden layer and the output layer is:

$\frac{\partial{E(n)}}{\partial{\omega_{ij}^{2}(n)}} = {{\frac{\partial{E(n)}}{\partial{e_{j}(n)}} \cdot \frac{\partial{e_{j}(n)}}{\partial{v_{J}^{j}(n)}} \cdot \frac{\partial{v_{J}^{j}(n)}}{\partial{u_{J}^{j}(n)}} \cdot \frac{\partial{u_{J}^{j}(n)}}{\partial{\omega_{ij}^{2}(n)}}} = {{{e_{j}(n)} \cdot \left( {- 1} \right) \cdot {g^{\prime}\left( {u_{J}^{j}(n)} \right)} \cdot {v_{I}^{i}(n)}} = {{- {e_{j}(n)}}{g^{\prime}\left( {u_{J}^{j}(n)} \right)}{{v_{I}^{i}(n)}.}}}}$

A partial derivative of the total error to the threshold b_(j) ² between the hidden layer and the output layer is:

$\frac{\partial{E(n)}}{\partial{b_{j}^{2}(n)}} = {{\frac{\partial{E(n)}}{\partial{e_{j}(n)}} \cdot \frac{\partial{e_{j}(n)}}{\partial{v_{J}^{j}(n)}} \cdot \frac{\partial{v_{J}^{j}(n)}}{\partial{u_{J}^{j}(n)}} \cdot \frac{\partial{u_{J}^{j}(n)}}{\partial{b_{j}^{2}(n)}}} = {{{e_{j}(n)} \cdot \left( {- 1} \right) \cdot {g^{\prime}\left( {u_{J}^{j}(n)} \right)} \cdot \left( {- 1} \right)} = {{e_{j}(n)}{{g^{\prime}\left( {u_{J}^{j}(n)} \right)}.}}}}$

A local gradient is:

$\delta_{J}^{j} = {{- \frac{\partial{E(n)}}{\partial{u_{J}^{j}(n)}}} = {{{- \frac{\partial{E(n)}}{\partial{e_{j}(n)}}} \cdot \frac{\partial{e_{j}(n)}}{\partial{v_{J}^{j}(n)}} \cdot \frac{\partial{v_{J}^{j}(n)}}{\partial{u_{J}^{j}(n)}}} = {{e_{j}(n)}{{g^{\prime}\left( {u_{J}^{j}(n)} \right)}.}}}}$

2. The error signal continues to be propagated backwards toward the input layer, and the weight ω_(mi) ¹ and the threshold b_(i) ¹ between the input layer and the hidden layer are adjusted.

A partial derivative of the total error to the weight ω_(mi) ¹ between the input layer and the hidden layer is:

$\frac{\partial{E(n)}}{\partial\omega_{mi}^{1}} = {{\frac{\partial{E(n)}}{\partial{e_{j}(n)}} \cdot \frac{\partial{e_{j}(n)}}{\partial{v_{J}^{j}(n)}} \cdot \frac{\partial{v_{J}^{j}(n)}}{\partial{u_{J}^{j}(n)}} \cdot \frac{\partial{u_{J}^{j}(n)}}{\partial{v_{I}^{i}(n)}} \cdot \frac{\partial{v_{I}^{i}(n)}}{\partial{u_{I}^{i}(n)}} \cdot \frac{\partial{u_{I}^{i}(n)}}{\partial\omega_{mi}^{1}}} = {- {\sum\limits_{j = 1}^{J}{\left( {\delta_{J}^{j} \cdot {\omega_{ij}^{2}(n)}} \right) \cdot {f^{\prime}\left( {u_{I}^{i}(n)} \right)} \cdot {{v_{M}^{m}(n)}.}}}}}$

A partial derivative of the total error to the threshold b_(i) ¹ between the input layer and the hidden layer is:

$\frac{\partial{E(n)}}{\partial{b_{i}^{1}(n)}} = {{\frac{\partial{E(n)}}{\partial{e_{j}(n)}} \cdot \frac{\partial{e_{j}(n)}}{\partial{v_{J}^{j}(n)}} \cdot \frac{\partial{v_{J}^{j}(n)}}{\partial{u_{J}^{j}(n)}} \cdot \frac{\partial{u_{J}^{j}(n)}}{\partial{v_{I}^{i}(n)}} \cdot \frac{\partial{v_{I}^{i}(n)}}{\partial{u_{I}^{i}(n)}} \cdot \frac{\partial{u_{I}^{i}(n)}}{\partial{b_{i}^{1}(n)}}} = {- {\sum\limits_{j = 1}^{J}{\left( {\delta_{J}^{j} \cdot {\omega_{ij}^{2}(n)}} \right) \cdot {f^{\prime}\left( {u_{I}^{i}(n)} \right)} \cdot {\left( {- 1} \right).}}}}}$

A local gradient is:

$\delta_{I}^{i} = - \frac{\partial{E(n)}}{\partial{u_{I}^{i}(n)}} = - \frac{\partial{E(n)}}{\partial{v_{I}^{i}(n)}} \cdot \frac{\partial{v_{I}^{i}(n)}}{\partial{u_{I}^{i}(n)}} = - \frac{\partial{E(n)}}{\partial{v_{I}^{i}(n)}} \cdot f^{\prime}\left( {u_{I}^{i}(n)} \right) = {\sum\limits_{j = 1}^{J}{\left( {\delta_{J}^{j} \cdot {\omega_{ij}^{2}(n)}} \right) \cdot f^{\prime}\left( {u_{I}^{i}(n)} \right)}}.$

The local gradient of a neuron is equal to the product of the error signal of the neuron and the derivative of the transfer function.

So, the partial derivatives with respect to the weights and the thresholds are denoted with the local gradients as:

${\frac{\partial{E(n)}}{\partial{\omega_{ij}^{2}(n)}} = {{- \delta_{J}^{j}}{v_{I}^{i}(n)}}};$${\frac{\partial{E(n)}}{\partial{b_{j}^{2}(n)}} = \delta_{J}^{j}};$${\frac{\partial{E(n)}}{\partial\omega_{mi}^{1}} = {{- \delta_{I}^{i}}{v_{M}^{m}(n)}}};$$\frac{\partial{E(n)}}{\partial{b_{i}^{1}(n)}} = {\delta_{I}^{i}.}$
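The relations above translate almost line for line into code. The NumPy sketch below (again an illustration, not the patented implementation) computes the two local gradients for a sigmoid hidden layer, whose derivative can be written as f′(u) = f(u)(1 − f(u)), and a linear output layer with g′(u) = 1, and then forms the four partial derivatives expressed through the local gradients:

```python
import numpy as np

def local_gradients(e, v_I, W2):
    """delta_J^j = e_j * g'(u_J^j);  delta_I^i = sum_j (delta_J^j * w_ij^2) * f'(u_I^i)."""
    delta_J = e * 1.0                 # linear output layer: g'(u) = 1
    f_prime = v_I * (1.0 - v_I)       # sigmoid derivative written via its own output
    delta_I = (W2 @ delta_J) * f_prime
    return delta_J, delta_I

def error_partials(delta_J, delta_I, v_I, x):
    """Partial derivatives of E(n) written with the local gradients."""
    dE_dW2 = -np.outer(v_I, delta_J)  # dE/dw_ij^2 = -delta_J^j * v_I^i
    dE_db2 = delta_J                  # dE/db_j^2  =  delta_J^j
    dE_dW1 = -np.outer(x, delta_I)    # dE/dw_mi^1 = -delta_I^i * v_M^m
    dE_db1 = delta_I                  # dE/db_i^1  =  delta_I^i
    return dE_dW1, dE_db1, dE_dW2, dE_db2
```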

In step 7, an error value matrix and a Jacobian matrix are calculated.

An error value matrix of Q samples is:

${e(n)} = {\begin{bmatrix}{e_{11}(n)} & {e_{12}(n)} & \cdots & {e_{1q}(n)} & \cdots & {e_{1Q}(n)} \\{e_{21}(n)} & {e_{22}(n)} & \cdots & {e_{2q}(n)} & \cdots & {e_{2Q}(n)} \\ \vdots & \vdots & & \vdots & & \vdots \\{e_{j1}(n)} & {e_{j2}(n)} & \cdots & {e_{jq}(n)} & \cdots & {e_{jQ}(n)} \\ \vdots & \vdots & & \vdots & & \vdots \\{e_{J1}(n)} & {e_{J2}(n)} & \cdots & {e_{Jq}(n)} & \cdots & {e_{JQ}(n)}\end{bmatrix}.}$

An element of the Jacobian matrix is:

${J_{jq}(n)} = {\begin{bmatrix}\frac{\partial{e_{jq}(n)}}{\partial{\omega_{11}^{2}(n)}} & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{12}^{2}(n)}} & \cdots & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{1j}^{2}(n)}} & \cdots & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{1J}^{2}(n)}} \\\frac{\partial{e_{jq}(n)}}{\partial{\omega_{21}^{2}(n)}} & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{22}^{2}(n)}} & \cdots & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{2j}^{2}(n)}} & \cdots & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{2J}^{2}(n)}} \\ \vdots & \vdots & & \vdots & & \vdots \\\frac{\partial{e_{jq}(n)}}{\partial{\omega_{i1}^{2}(n)}} & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{i2}^{2}(n)}} & \cdots & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{ij}^{2}(n)}} & \cdots & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{iJ}^{2}(n)}} \\ \vdots & \vdots & & \vdots & & \vdots \\\frac{\partial{e_{jq}(n)}}{\partial{\omega_{I1}^{2}(n)}} & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{I2}^{2}(n)}} & \cdots & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{Ij}^{2}(n)}} & \cdots & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{IJ}^{2}(n)}}\end{bmatrix}.}$

A structure of the Jacobian matrix is:

${J(n)} = {\begin{bmatrix}{J_{11}(n)} & {J_{12}(n)} & \cdots & {J_{1q}(n)} & \cdots & {J_{1Q}(n)} \\{J_{21}(n)} & {J_{22}(n)} & \cdots & {J_{2q}(n)} & \cdots & {J_{2Q}(n)} \\ \vdots & \vdots & & \vdots & & \vdots \\{J_{j1}(n)} & {J_{j2}(n)} & \cdots & {J_{jq}(n)} & \cdots & {J_{jQ}(n)} \\ \vdots & \vdots & & \vdots & & \vdots \\{J_{J1}(n)} & {J_{J2}(n)} & \cdots & {J_{Jq}(n)} & \cdots & {J_{JQ}(n)}\end{bmatrix}.}$

Similarly, the Jacobian matrix of the weights of the input layer and the hidden layer may be obtained.

When the error performance function has the form of a square sum error, the Hessian matrix may be approximately denoted as H=J^(T)J; and the gradient may be denoted as g=J^(T)e, wherein J is a Jacobian matrix of the first derivative of the error performance function with respect to the weights of the network.
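The two approximations can be illustrated with a few lines of NumPy; the Jacobian and error values below are made-up numbers used only to show the shapes involved:

```python
import numpy as np

def gauss_newton_terms(J, e):
    """Given the Jacobian J (one row per error term, one column per adjustable weight)
    and the stacked error vector e, form H ~ J^T J and g = J^T e."""
    H = J.T @ J
    g = J.T @ e
    return H, g

# Tiny example: 4 error terms, 3 adjustable weights (illustrative numbers)
J = np.array([[0.2, -0.1, 0.4],
              [0.0,  0.3, 0.1],
              [0.5,  0.2, -0.3],
              [0.1,  0.1,  0.2]])
e = np.array([0.05, -0.02, 0.10, 0.01])
H, g = gauss_newton_terms(J, e)
```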

In step 8, a weight and a threshold of the neural network are updated.

Adjustment amount Δω=learning rate η·local gradient δ·output signal of previous layer v.

Due to uncertain reversibility of J^(T)J, a unit matrix U is introduced to obtain H=J^(T)J+μU.

A weight and a threshold of the LM algorithm are corrected according to the following formula:

ω(n+1)=ω(n)−[J^(T)J+μU]⁻¹J^(T)e.

When μ=0, the LM algorithm is degenerated into a Newton Method.
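A minimal sketch of this correction step is shown below; the damping-factor schedule in the second function (shrink μ after a successful step, enlarge it otherwise) is common LM practice and is our assumption rather than something prescribed by the text:

```python
import numpy as np

def lm_step(w, J, e, mu):
    """One Levenberg-Marquardt correction: w <- w - (J^T J + mu*U)^(-1) J^T e."""
    U = np.eye(J.shape[1])
    delta = np.linalg.solve(J.T @ J + mu * U, J.T @ e)
    return w - delta

def adapt_mu(mu, error_decreased, factor=10.0):
    """Typical damping schedule: mu -> mu/factor on success, mu*factor otherwise."""
    return mu / factor if error_decreased else mu * factor
```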

A weight update formula is as follows:

ω_(ij) ²(n+1)=ω_(ij) ²(n)−[J²(n)^(T)J²(n)+μU]⁻¹ηJ²(n)^(T)e_(j)(n)v_(I) ^(i)(n);

ω_(mi) ¹(n+1)=ω_(mi) ¹(n)−[J²(n)^(T)J²(n)+μU]⁻¹ηJ¹(n)^(T)Σ_(j=1) ^(J)(J²(n)e_(j)(n)ω_(ij) ²(n))v_(I) ^(i)(n)v_(M) ^(m)(n);

and

A threshold update formula is as follows:

b_(j) ²(n+1)=b_(j) ²(n)−[J²(n)^(T)J²(n)+μU]⁻¹ηJ²(n)^(T)e_(j)(n)v_(I) ^(i)(n);

b_(i) ¹(n+1)=b_(i) ¹(n)−[J¹(n)^(T)J¹(n)+μU]⁻¹ηJ¹(n)^(T)Σ_(j=1) ^(J)(J²(n)e_(j)(n)ω_(ij) ²(n))v_(I) ^(i)(n)v_(M) ^(m)(n).

In step 9, error calculation and neural network testing are performed.

An error value is calculated, and whether the MSE error criterion meets the precision requirement is judged. When the MSE criterion does not meet the precision requirement, the iteration is continued. When the MSE criterion meets the precision requirement, the iteration is stopped. It is usually necessary to set a maximum number of iterations to prevent the program from running in an endless loop.

After finishing training of the neural network, the testing set is tested.

After finishing training of N iterations, a group of optimum weights ω_(mi) ¹(N) and ω_(ij) ²(N), and a group of optimum thresholds b_(i) ¹(N) and b_(j) ²(N) are obtained, and the normalized data u_(M) ^(m)(1) of the testing set is input. An output result Y(N) of the network is obtained by iterating once through the above calculation.

An actual predicted value y should be obtained by inversely normalizing the output result of the network. The inverse normalization is realized by the mapminmax function, and the algorithm principle is as follows (a corresponding NumPy sketch is given after the two formulas).

1. y=Y(N)*(x_(max)−x_(min))+x_(min): the [0, 1] interval is inversely normalized.

2. $y = {{\frac{1}{2}\left( {{Y(N)} + 1} \right)*\left( {x_{\max} - x_{\min}} \right)} + {x_{\min}}}$: the [−1, 1] interval is inversely normalized.
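The two inverse mappings can be written directly as below (a small NumPy sketch; x_max and x_min are the extrema stored when the data were normalized, and the numerical values in the example are illustrative only):

```python
import numpy as np

def denormalize_01(y, x_min, x_max):
    """Inverse of the [0, 1] normalization: y*(x_max - x_min) + x_min."""
    return y * (x_max - x_min) + x_min

def denormalize_11(y, x_min, x_max):
    """Inverse of the [-1, 1] normalization: (y + 1)/2*(x_max - x_min) + x_min."""
    return 0.5 * (y + 1.0) * (x_max - x_min) + x_min

# Example: outputs on [-1, 1] mapped back to an assumed yield strength range in MPa
print(denormalize_11(np.array([-0.2, 0.4]), x_min=200.0, x_max=700.0))
```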

In the embodiment, data [x, o] are input into the BP neural network model, predicted values of the yield strength, the tensile strength and the elongation after fracture are output, and the predicted values are compared with the corresponding real values. The relative errors refer to Table 1.

TABLE 1 Prediction results of the BP neural network

| Output value | Data set | Partial relative error RE | Average value ARE |
| --- | --- | --- | --- |
| Yield strength | Training set | 0.053581, 0.042883, 0.067611, 0.053592, 0.016761, 0.08837, 0.025762, 0.11111 | 0.086146 |
| Yield strength | Testing set | 0.13623, 0.077594, 0.021944, 0.010062, 0.12235, 0.09662, 0.13623, 0.11435 | 0.089153 |
| Tensile strength | Training set | 0.049592, 0.026637, 0.050726, 0.10438, 0.022479, 0.013203, 0.063114, 0.056277 | 0.04805 |
| Tensile strength | Testing set | 0.080171, 0.027546, 0.05005, 0.069319, 0.0064048, 0.016949, 0.032492, 0.074133 | 0.04969 |
| Elongation after fracture | Training set | 0.029178, 0.13184, 0.19602, 0.057457, 0.039334, 0.022855, 0.074339, 0.056146 | 0.10564 |
| Elongation after fracture | Testing set | 0.083664, 0.010158, 0.12724, 0.065606, 0.049305, 0.016366, 0.00012153, 0.14725 | 0.116 |

A true value, a simulated value and an absolute error of each group of data are shown in FIGS. 3A-3F.

Regression curves of the training set, a verification set and the testing set are shown in FIGS. 4A-4D.

It can be seen from Table 1 that the error of each testing set is only slightly larger than that of the corresponding training set, so that good model training is realized. It can be seen from FIGS. 3A-3F that the simulated values and the true values of the data sets are close, so that the prediction result is relatively accurate. It can be seen from FIGS. 4A-4D that the correlation coefficient of the training set is 0.9827, the correlation coefficient of the verification set is 0.98084, and the correlation coefficient of the testing set is 0.96999, which are all close to 1, so that a good regression capability is provided.

Different numbers of neurons of the hidden layer are selected to train and test the BP network, and 10 experiments are performed according to a 10-fold cross-validation method to obtain the average relative error change shown in FIG. 5.

It can be seen from FIG. 5 that, with the increase of the number of neurons of the hidden layer, the relative error value is reduced at first and then increased. When the number of neurons is 15 or 17, the relative error is lower and the prediction capability is better.

Different training functions of the neural network are selected to train and test the BP network, and 10 experiments are performed according to the 10-fold cross-validation method to obtain the average relative error change shown in Table 2.

TABLE 2 Relative errors of different training functions in the case of the optimum number of neurons of the hidden layer

| Algorithm | Optimum number of neurons of hidden layer | Number of iterations | Yield strength ARE (training set) | Yield strength ARE (testing set) | Tensile strength ARE (training set) | Tensile strength ARE (testing set) | Elongation after fracture ARE (training set) | Elongation after fracture ARE (testing set) | Total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| trainlm | 17 | 20-60 | 0.0883 | 0.1304 | 0.0476 | 0.0679 | 0.1126 | 0.1437 | 0.5905 |
| trainbfg | 20 | 45-120 | 0.0952 | 0.1216 | 0.0512 | 0.0653 | 0.1265 | 0.1599 | 0.6197 |
| traingdx | 18 | 174-522 | 0.1055 | 0.1331 | 0.0547 | 0.0677 | 0.141 | 0.1661 | 0.6681 |
| traingdm | 15 | >10000 | 0.1255 | 0.1509 | 0.0632 | 0.0744 | 0.1715 | 0.1964 | 0.7819 |
| trainscg | 16 | 34-143 | 0.0981 | 0.1225 | 0.0524 | 0.064 | 0.1291 | 0.1553 | 0.6214 |
| trainrp | 19 | 45-120 | 0.1041 | 0.1313 | 0.0555 | 0.0676 | 0.135 | 0.1657 | 0.6592 |

It can be seen from Table 2 that, compared with other training functions, the LM algorithm not only has a faster operation speed, but also achieves the optimum training effect. Compared with the BFG algorithm, the overall average relative error is reduced by about 4.7%.
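For reference, the 4.7% figure follows from the total average relative errors in Table 2: $\frac{0.6197 - 0.5905}{0.6197} \approx 4.7\%$.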

A comprehensive quantitative evaluation method is employed to evaluate the quality of heat treatment, and the relevant definitions are as follows:

a relative performance index is RI_(i)=C_(i)′/C_(i), an equivalent performance index is EI_(i)=C_(i)′/C_(i)−1, and a comprehensive performance index is IV=EI·W, wherein C_(i)′ represents an actually measured average value of a certain mechanical performance index, C_(i) represents an expected value or a median value of the performance index, and W represents a corresponding weight coefficient, which is generally based on the failure rate of a workpiece when the mechanical performance index is not reached in actual use.
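As an illustration of these definitions, the short Python sketch below reproduces the first experimental row of Table 4 (tensile strength 645.35 MPa against the 480 MPa requirement and elongation 52.84% against 40%, with the 1:1 weights used in this embodiment):

```python
def relative_index(measured, expected):
    """RI_i = C_i' / C_i."""
    return measured / expected

def equivalent_index(measured, expected):
    """EI_i = C_i' / C_i - 1."""
    return measured / expected - 1.0

def comprehensive_value(measured, expected, weights):
    """IV = sum_i EI_i * W_i."""
    return sum(w * equivalent_index(m, c) for m, c, w in zip(measured, expected, weights))

# Experimental value, R direction, 950 deg C / 1 min (Table 4)
measured = [645.35, 52.84]   # tensile strength / MPa, elongation after fracture / %
expected = [480.0, 40.0]     # requirements for annealed 316L (Table 3)
weights  = [1.0, 1.0]        # tensile strength : elongation = 1 : 1
print(comprehensive_value(measured, expected, weights))   # ~0.665
```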

Mechanical performance requirements of annealed 316L stainless steel specified in national standards refer to Table 3.

TABLE 3 Mechanical performances of annealed 316L stainless steel

| Material | State | R_(p0.2)/MPa ≥ | Tensile strength/MPa ≥ | Elongation/% ≥ |
| --- | --- | --- | --- | --- |
| 316L | Annealing | 175 | 480 | 40 |

In order to make the annealed 316L have good strength and plasticity, the weight coefficient may be taken as tensile strength:elongation=1:1 to quantitatively evaluate the quality of heat treatment when the yield strength meets the requirements. An annealing process of the 316L stainless steel with a thickness of 0.02 mm to 0.05 mm in a certain factory is as follows: the annealing temperature is 950° C., the running speed of the steel strip is 10 m/min to 15 m/min, the length of the annealing furnace is 10.8 m, and the annealing lasts for 0.72 minute to 1.08 minutes.

Quantitative evaluation results of the heat treatment for the annealing process used in the factory and for part of the annealing processes predicted by the BP neural network refer to Table 4.

TABLE 4 Evaluation results of heat treatment of annealing processes

| | Sampling direction | Annealing temperature/° C. | Heat preservation time/min | Tensile strength/MPa | RI | EI | Elongation after fracture/% | RI | EI | Comprehensive performance value IV |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Experimental value | R | 950 | 1 | 645.35 | 1.344 | 0.344 | 52.84 | 1.321 | 0.321 | 0.665 |
| Experimental value | T | 950 | 1 | 652.86 | 1.36 | 0.36 | 54.796 | 1.37 | 0.37 | 0.73 |
| Predicted value | R | 940 | 3 | 508.48 | 1.059 | 0.059 | 64.867 | 1.622 | 0.622 | 0.681 |
| Predicted value | T | 940 | 3 | 593.71 | 1.237 | 0.237 | 74.457 | 1.861 | 0.861 | 1.098 |
| Predicted value | R | 960 | 2.5 | 524.9 | 1.094 | 0.094 | 64.476 | 1.611 | 0.612 | 0.706 |
| Predicted value | T | 960 | 2.5 | 573.77 | 1.195 | 0.195 | 77.808 | 1.945 | 0.945 | 1.14 |

It can be seen from Table 4 that, when the quality of heat treatment is evaluated according to the weight coefficient of tensile strength:elongation=1:1, the 316L stainless steel has better comprehensive mechanical performances in the case that the annealing temperature is 940° C. and the heat preservation time is 3 minutes, or the annealing temperature is 960° C. and the heat preservation time is 2.5 minutes. Compared with the annealing process used in the factory (annealing temperature 950° C. and heat preservation time 1 minute), the comprehensive performance value in the T direction can be increased by 56.16% when the annealing temperature is 960° C. and the heat preservation time is 2.5 minutes.
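The 56.16% figure follows directly from the comprehensive performance values IV in Table 4: $\frac{1.14 - 0.73}{0.73} \approx 56.16\%$.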

What is not described in detail in the specification of the present invention belongs to the prior art known to those skilled in the art. The illustrative specific embodiments of the present invention are described above for the convenience of understanding the present invention by those skilled in the art, but it should be clear that the present invention is not limited to the scope of the specific embodiments. For those of ordinary skill in the art, as long as various changes are within the spirit and scope of the present invention defined and determined by the appended claims, these changes are obvious, and all inventions using the inventive concept are protected.

We claim:
 1. A neural network model optimization method based on an annealing process for a stainless steel ultra-thin strip, wherein an error back propagation algorithm is employed to train a single hidden layer neural network, comprising: step 1: designing a network model, and determining a number of layers of the network, and a number of nodes of an input layer, a number of nodes of an output layer and a number of nodes of a hidden layer; step 2: selecting a transfer function, a training method and training parameters; step 3: selecting sample data according to the step 2, dividing the sample data into a training set and a testing set, and performing data preprocessing; step 4: setting and initializing parameters of the neural network; step 5: adjusting forward propagation of a working signal of the neural network; step 6: adjusting back propagation of an error signal of the neural network; step 7: calculating an error value matrix and a Jacobian matrix; step 8: updating a weight and a threshold of the neural network; and step 9: performing error calculation and neural network testing.
 2. The neural network model optimization method based on the annealing process for the stainless steel ultra-thin strip according to claim 1, wherein a multi-layer network with one hidden layer is used.
 3. The neural network model optimization method based on the annealing process for the stainless steel ultra-thin strip according to claim 2, wherein the selecting the sample data according to the step 2, dividing the sample data into the training set and the testing set, and performing the data preprocessing in the step 3, further comprises the following steps of: step 3.1: dividing the sample data into the training set and the testing set; and step 3.2: normalizing the samples in the training set and the testing set.
 4. The neural network model optimization method based on the annealing process for the stainless steel ultra-thin strip according to claim 3, wherein a specific method for the normalizing the samples of the training set and the testing set in the step 3.2, comprises: mapping data to [0, 1] or [−1, 1] by using a mapminmax function, and recording an input in a data set as x and an output in the data set as o; that is: normalizing the samples to [0, 1] by a formula u_(M)(1)=(x−x_(min))/(x_(max)−x_(min)); and normalizing the samples to [−1, 1] by a formula u_(M)(1)=2*(x−x_(min))/(x_(max)−x_(min))−1, wherein u_(M)(1) represents an initial input value of the network; and similarly, normalizing the output o to obtain an expected output d(n) of the network, wherein x_(max) represents a maximum input value, and x_(min) represents a minimum input value.
 5. The neural network model optimization method based on the annealing process for the stainless steel ultra-thin strip according to claim 4, wherein a specific method for the setting and initializing the parameters of the neural network in the step 4 comprises: employing a three-layer neural network, setting a transfer function of the hidden layer as a Sigmoid function, and setting a transfer function of the output layer as a linear function; and representing an input and an output of each layer with u and v, wherein: an input of the input layer is $u_M^m(n)$ and an output of the input layer is $v_M^m(n)$; an input of the hidden layer is $u_I^i(n)$ and an output of the hidden layer is $v_I^i(n)$; an input of the output layer is $u_J^j(n)$ and an output of the output layer is $v_J^j(n)$; a number of neurons of the input layer is M and an $m^{th}$ neuron of the input layer is recorded as $x_m$; a number of neurons of the hidden layer is I and an $i^{th}$ neuron of the hidden layer is recorded as $k_i$; a number of neurons of the output layer is J and a $j^{th}$ neuron of the output layer is recorded as $y_j$; a connection weight from $x_m$ to $k_i$ is $\omega_{mi}^1$ and a connection threshold is $b_i^1$; a connection weight from $k_i$ to $y_j$ is $\omega_{ij}^2$ and a connection threshold is $b_j^2$; an input signal of the network is denoted as $u_M(n)=[u_M^1,u_M^2,\ldots,u_M^M]'$; an actual output of the network is denoted as $Y(n)=[v_J^1,v_J^2,\ldots,v_J^J]$; an expected output of the network is denoted as $d(n)=[d_1,d_2,\ldots,d_J]$; wherein n represents a number of iterations, and d represents an output value of the sample data; an error of the $j^{th}$ neuron of the output layer in an $n^{th}$ iteration is denoted as $e_j(n)=d_j(n)-Y_j(n)$, and a total error is denoted as ${{E(n)} = {\frac{1}{2}{\sum_{j = 1}^{J}{e_{j}^{2}(n)}}}},$ wherein e represents the error; a weight matrix $W^1$ between the neuron of the input layer and the neuron of the hidden layer is: ${{W^{1}(n)} = \begin{bmatrix}\omega_{11}^{1} & \omega_{12}^{1} & \cdots & \omega_{1i}^{1} & \cdots & \omega_{1I}^{1} \\\omega_{21}^{1} & \omega_{22}^{1} & \cdots & \omega_{2i}^{1} & \cdots & \omega_{2I}^{1} \\ \vdots & \vdots & & \vdots & & \vdots \\\omega_{m1}^{1} & \omega_{m2}^{1} & \cdots & \omega_{mi}^{1} & \cdots & \omega_{mI}^{1} \\ \vdots & \vdots & & \vdots & & \vdots \\\omega_{M1}^{1} & \omega_{M2}^{1} & \cdots & \omega_{Mi}^{1} & \cdots & \omega_{MI}^{1}\end{bmatrix}};$ a weight matrix $W^2$ between the neuron of the hidden layer and the neuron of the output layer is: ${{W^{2}(n)} = \begin{bmatrix}\omega_{11}^{2} & \omega_{12}^{2} & \cdots & \omega_{1j}^{2} & \cdots & \omega_{1J}^{2} \\\omega_{21}^{2} & \omega_{22}^{2} & \cdots & \omega_{2j}^{2} & \cdots & \omega_{2J}^{2} \\ \vdots & \vdots & & \vdots & & \vdots \\\omega_{i1}^{2} & \omega_{i2}^{2} & \cdots & \omega_{ij}^{2} & \cdots & \omega_{iJ}^{2} \\ \vdots & \vdots & & \vdots & & \vdots \\\omega_{I1}^{2} & \omega_{I2}^{2} & \cdots & \omega_{Ij}^{2} & \cdots & \omega_{IJ}^{2}\end{bmatrix}};$ a threshold $b^1(n)$ of the neuron of the hidden layer is: $b^1(n)=[b_1^1,b_2^1,\ldots,b_I^1]'$; and a threshold $b^2(n)$ of the neuron of the output layer is: $b^2(n)=[b_1^2,b_2^2,\ldots,b_J^2]'$.
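For readers following the notation, a minimal initialization sketch is given below; the correspondence of W1/b1 and W2/b2 to $\omega^1/b^1$ and $\omega^2/b^2$ comes from the claim, while the uniform initialization range is an assumption the claim does not prescribe.

    import numpy as np

    def init_network(M, I, J, seed=0):
        # M, I, J are the numbers of input, hidden and output neurons.
        rng = np.random.default_rng(seed)
        W1 = rng.uniform(-0.5, 0.5, size=(M, I))   # omega^1: input -> hidden weights
        b1 = rng.uniform(-0.5, 0.5, size=I)        # b^1: hidden-layer thresholds
        W2 = rng.uniform(-0.5, 0.5, size=(I, J))   # omega^2: hidden -> output weights
        b2 = rng.uniform(-0.5, 0.5, size=J)        # b^2: output-layer thresholds
        return W1, b1, W2, b2

    def sigmoid(u):
        # Transfer function f of the hidden layer.
        return 1.0 / (1.0 + np.exp(-u))

    def linear(u):
        # Transfer function g of the output layer.
        return u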
 6. The neural network model optimization method based on the annealing process for the stainless steel ultra-thin strip according to claim 5, wherein a specific method for the forward propagation of the working signal of the neural network in the step 5 comprises: setting the output of the input layer to be equal to an input signal of the network: $v_M^m(n)=u_M^m(n)$; setting the input of the $i^{th}$ neuron of the hidden layer to be equal to a weighted sum of the output of the input layer: $u_I^i(n)=\sum_{m=1}^{M}\omega_{mi}^1(n)v_M^m(n)-b_i^1(n)$; and setting the output of the $i^{th}$ neuron of the hidden layer to be equal to the transfer function of the hidden layer: $v_I^i(n)=f(u_I^i(n))$, wherein f(⋅) is the transfer function of the hidden layer; setting the input of the $j^{th}$ neuron of the output layer to be equal to a weighted sum of the output of the hidden layer: $u_J^j(n)=\sum_{i=1}^{I}\omega_{ij}^2(n)v_I^i(n)-b_j^2(n)$; and setting the output of the $j^{th}$ neuron of the output layer to be equal to the transfer function of the output layer: $v_J^j(n)=g(u_J^j(n))$, wherein g(⋅) is the transfer function of the output layer; so, an error of the $j^{th}$ neuron of the output layer is equal to: $e_j(n)=d_j(n)-v_J^j(n)$; and a total error of the network is denoted as: ${E(n)} = {{\frac{1}{2}{\sum\limits_{j = 1}^{J}{e_{j}^{2}(n)}}} = {\frac{1}{2}{\sum\limits_{j = 1}^{J}{\left\{ {{d_{j}(n)} - {g\left\lbrack {{\sum\limits_{i = 1}^{I}{{\omega_{ij}^{2}(n)}{f\left( {{\sum\limits_{m = 1}^{M}{{\omega_{mi}^{1}(n)}{v_{M}^{m}(n)}}} - {b_{i}^{1}(n)}} \right)}}} - {b_{j}^{2}(n)}} \right\rbrack}} \right\}^{2}.}}}}$
 7. The neural network model optimization method based on the annealing process for the stainless steel ultra-thin strip according to claim 6, wherein a specific method for the back propagation of the error signal of the neural network in the step 6 comprises: step 6.1: in a weight and threshold adjustment stage, reversely adjusting layer by layer along the neural network, and adjusting the weight $\omega_{ij}^2$ and the threshold $b_j^2$ between the hidden layer and the output layer first; a partial derivative of the total error to the weight $\omega_{ij}^2$ between the hidden layer and the output layer being: ${\frac{\partial{E(n)}}{\partial{\omega_{ij}^{2}(n)}} = {{\frac{\partial{E(n)}}{\partial{e_{j}(n)}} \cdot \frac{\partial{e_{j}(n)}}{\partial{v_{J}^{j}(n)}} \cdot \frac{\partial{v_{J}^{j}(n)}}{\partial{u_{J}^{j}(n)}} \cdot \frac{\partial{u_{J}^{j}(n)}}{\partial{\omega_{ij}^{2}(n)}}} = {{{e_{j}(n)} \cdot \left( {- 1} \right) \cdot {g^{\prime}\left( {u_{J}^{j}(n)} \right)} \cdot {v_{I}^{i}(n)}} = {{- {e_{j}(n)}}{g^{\prime}\left( {u_{J}^{j}(n)} \right)}{v_{I}^{i}(n)}}}}},$ a partial derivative of the total error to the threshold $b_j^2$ between the hidden layer and the output layer being: ${\frac{\partial{E(n)}}{\partial{b_{j}^{2}(n)}} = {{\frac{\partial{E(n)}}{\partial{e_{j}(n)}} \cdot \frac{\partial{e_{j}(n)}}{\partial{v_{J}^{j}(n)}} \cdot \frac{\partial{v_{J}^{j}(n)}}{\partial{u_{J}^{j}(n)}} \cdot \frac{\partial{u_{J}^{j}(n)}}{\partial{b_{j}^{2}(n)}}} = {{{e_{j}(n)} \cdot \left( {- 1} \right) \cdot {g^{\prime}\left( {u_{J}^{j}(n)} \right)} \cdot \left( {- 1} \right)} = {{e_{j}(n)}{g^{\prime}\left( {u_{J}^{j}(n)} \right)}}}}},$ a local gradient being: ${\delta_{J}^{j} = {{- \frac{\partial{E(n)}}{\partial{u_{J}^{j}(n)}}} = {{{- \frac{\partial{E(n)}}{\partial{e_{j}(n)}}} \cdot \frac{\partial{e_{j}(n)}}{\partial{v_{J}^{j}(n)}} \cdot \frac{\partial{v_{J}^{j}(n)}}{\partial{u_{J}^{j}(n)}}} = {{e_{j}(n)}{g^{\prime}\left( {u_{J}^{j}(n)} \right)}}}}},$ wherein g′(⋅) represents a derivative of the transfer function g(⋅) of the output layer; and step 6.2: forwardly propagating the error signal, and adjusting the weight $\omega_{mi}^1$ and the threshold $b_i^1$ between the input layer and the hidden layer; a partial derivative of the total error to the weight $\omega_{mi}^1$ between the input layer and the hidden layer being: ${\frac{\partial{E(n)}}{\partial\omega_{mi}^{1}} = {{\frac{\partial{E(n)}}{\partial{e_{j}(n)}} \cdot \frac{\partial{e_{j}(n)}}{\partial{v_{J}^{j}(n)}} \cdot \frac{\partial{v_{J}^{j}(n)}}{\partial{u_{J}^{j}(n)}} \cdot \frac{\partial{u_{J}^{j}(n)}}{\partial{v_{I}^{i}(n)}} \cdot \frac{\partial{v_{I}^{i}(n)}}{\partial{u_{I}^{i}(n)}} \cdot \frac{\partial{u_{I}^{i}(n)}}{\partial\omega_{mi}^{1}}} = {- {\sum\limits_{j = 1}^{J}{\left( {\delta_{J}^{j} \cdot {\omega_{ij}^{2}(n)}} \right) \cdot {f^{\prime}\left( {u_{I}^{i}(n)} \right)} \cdot {v_{M}^{m}(n)}}}}}},$ a partial derivative of the total error to the threshold $b_i^1$ between the input layer and the hidden layer being: ${\frac{\partial{E(n)}}{\partial{b_{i}^{1}(n)}} = {{\frac{\partial{E(n)}}{\partial{e_{j}(n)}} \cdot \frac{\partial{e_{j}(n)}}{\partial{v_{J}^{j}(n)}} \cdot \frac{\partial{v_{J}^{j}(n)}}{\partial{u_{J}^{j}(n)}} \cdot \frac{\partial{u_{J}^{j}(n)}}{\partial{v_{I}^{i}(n)}} \cdot \frac{\partial{v_{I}^{i}(n)}}{\partial{u_{I}^{i}(n)}} \cdot \frac{\partial{u_{I}^{i}(n)}}{\partial{b_{i}^{1}(n)}}} = {- {\sum\limits_{j = 1}^{J}{\left( {\delta_{J}^{j} \cdot {\omega_{ij}^{2}(n)}} \right) \cdot {f^{\prime}\left( {u_{I}^{i}(n)} \right)} \cdot \left( {- 1} \right)}}}}},$ a local gradient being: ${\delta_{I}^{i} = {{- \frac{\partial{E(n)}}{\partial{u_{I}^{i}(n)}}} = {{{- \frac{\partial{E(n)}}{\partial{v_{I}^{i}(n)}}} \cdot \frac{\partial{v_{I}^{i}(n)}}{\partial{u_{I}^{i}(n)}}} = {{{- \frac{\partial{E(n)}}{\partial{v_{I}^{i}(n)}}} \cdot {f^{\prime}\left( {u_{I}^{i}(n)} \right)}} = {\sum\limits_{j = 1}^{J}{\left( {\delta_{J}^{j} \cdot {\omega_{ij}^{2}(n)}} \right) \cdot {f^{\prime}\left( {u_{I}^{i}(n)} \right)}}}}}}},$ wherein f′(⋅) represents a derivative of the transfer function f(⋅) of the hidden layer; and the local gradient of the neuron is equal to a product of the error signal of the neuron and the derivative of the transfer function; so, the weight and the threshold are denoted with the local gradient as: ${\frac{\partial{E(n)}}{\partial{\omega_{ij}^{2}(n)}} = {{- \delta_{J}^{j}}{v_{I}^{i}(n)}}};$ ${\frac{\partial{E(n)}}{\partial{b_{j}^{2}(n)}} = \delta_{J}^{j}};$ ${\frac{\partial{E(n)}}{\partial\omega_{mi}^{1}} = {{- \delta_{I}^{i}}{v_{M}^{m}(n)}}};$ $\frac{\partial{E(n)}}{\partial{b_{i}^{1}(n)}} = {\delta_{I}^{i}.}$
 8. The neural network model optimization method based on the annealing process for the stainless steel ultra-thin strip according to claim 7, wherein a specific method for the calculating the error value matrix and the Jacobian matrix in the step 7 comprises: denoting an error value matrix of Q samples as: ${{e(n)} = \begin{bmatrix}{e_{11}(n)} & {e_{12}(n)} & \cdots & {e_{1q}(n)} & \cdots & {e_{1Q}(n)} \\{e_{21}(n)} & {e_{22}(n)} & \cdots & {e_{2q}(n)} & \cdots & {e_{2Q}(n)} \\ \vdots & \vdots & & \vdots & & \vdots \\{e_{j1}(n)} & {e_{j2}(n)} & \cdots & {e_{jq}(n)} & \cdots & {e_{jQ}(n)} \\ \vdots & \vdots & & \vdots & & \vdots \\{e_{J1}(n)} & {e_{J2}(n)} & \cdots & {e_{Jq}(n)} & \cdots & {e_{JQ}(n)}\end{bmatrix}};$ denoting an element of the Jacobian matrix as: ${{J_{jq}(n)} = \begin{bmatrix}\frac{\partial{e_{jq}(n)}}{\partial{\omega_{11}^{2}(n)}} & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{12}^{2}(n)}} & \cdots & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{1j}^{2}(n)}} & \cdots & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{1J}^{2}(n)}} \\\frac{\partial{e_{jq}(n)}}{\partial{\omega_{21}^{2}(n)}} & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{22}^{2}(n)}} & \cdots & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{2j}^{2}(n)}} & \cdots & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{2J}^{2}(n)}} \\ \vdots & \vdots & & \vdots & & \vdots \\\frac{\partial{e_{jq}(n)}}{\partial{\omega_{i1}^{2}(n)}} & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{i2}^{2}(n)}} & \cdots & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{ij}^{2}(n)}} & \cdots & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{iJ}^{2}(n)}} \\ \vdots & \vdots & & \vdots & & \vdots \\\frac{\partial{e_{jq}(n)}}{\partial{\omega_{I1}^{2}(n)}} & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{I2}^{2}(n)}} & \cdots & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{Ij}^{2}(n)}} & \cdots & \frac{\partial{e_{jq}(n)}}{\partial{\omega_{IJ}^{2}(n)}}\end{bmatrix}};$ denoting a structure of the Jacobian matrix as: ${{J(n)} = \begin{bmatrix}{J_{11}(n)} & {J_{12}(n)} & \cdots & {J_{1q}(n)} & \cdots & {J_{1Q}(n)} \\{J_{21}(n)} & {J_{22}(n)} & \cdots & {J_{2q}(n)} & \cdots & {J_{2Q}(n)} \\ \vdots & \vdots & & \vdots & & \vdots \\{J_{j1}(n)} & {J_{j2}(n)} & \cdots & {J_{jq}(n)} & \cdots & {J_{jQ}(n)} \\ \vdots & \vdots & & \vdots & & \vdots \\{J_{J1}(n)} & {J_{J2}(n)} & \cdots & {J_{Jq}(n)} & \cdots & {J_{JQ}(n)}\end{bmatrix}};$ similarly, obtaining the Jacobian matrix of the weights of the input layer and the hidden layer; H being a Hessian matrix of an error performance function, which contains second derivative information of the error function; when the error performance function has a form of square sum error, the Hessian matrix being approximately denoted as $H=J^{T}J$; and a gradient being denoted as $g=J^{T}e$, wherein J is a Jacobian matrix of a first derivative of the error performance function with respect to the weight of the network.
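The error matrix, the Jacobian, the approximate Hessian $H=J^{T}J$ and the gradient $g=J^{T}e$ can be sketched as follows; restricting the Jacobian to the hidden-to-output weight block, storing the errors as a Q-by-J array and assuming the linear output layer of claim 5 are simplifications made here to keep the indexing short.

    import numpy as np

    def sigmoid(u):
        return 1.0 / (1.0 + np.exp(-u))

    def output_jacobian(X, D, W1, b1, W2, b2):
        # Forward pass for all Q samples.
        V_I = sigmoid(X @ W1 - b1)        # hidden-layer outputs, Q x I
        Y = V_I @ W2 - b2                 # linear output layer,  Q x J
        E = D - Y                         # errors e_jq, stored here as Q x J
        Q, J_out = E.shape
        I_hid = V_I.shape[1]
        # For a linear output layer, d e_jq / d omega_ij^2 = -v_I^i(q).
        # Parameters are flattened row-major: column index = i*J_out + j.
        Jmat = np.zeros((Q * J_out, I_hid * J_out))
        for q in range(Q):
            for j in range(J_out):
                Jmat[q * J_out + j, j::J_out] = -V_I[q]
        e = E.reshape(-1)
        H = Jmat.T @ Jmat                 # approximate Hessian, H = J^T J
        g = Jmat.T @ e                    # gradient,            g = J^T e
        return Jmat, e, H, g

The Jacobian block for the input-to-hidden weights would be assembled in the same way, with the chain-rule factors of claim 7 in place of the single term used here.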
 9. The neural network model optimization method based on the annealing process for the stainless steel ultra-thin strip according to claim 8, wherein a specific method for the updating the weight and the threshold of the neural network in the step 8 comprises: an adjustment amount Δω = learning rate η · local gradient δ · output signal of a previous layer v; due to the uncertain invertibility of $J^{T}J$, a unit matrix U is introduced to obtain $H=J^{T}J+\mu U$, wherein μ is a damping factor; according to a formula $\omega(n+1)=\omega(n)-[J^{T}J+\mu U]^{-1}J^{T}e$, a weight and a threshold of an LM algorithm are corrected; and when μ=0, the LM algorithm is degenerated into a Newton method; a weight update formula is denoted as: $\omega_{ij}^{2}(n+1)=\omega_{ij}^{2}(n)-[J^{2}(n)^{T}J^{2}(n)+\mu U]^{-1}\eta J^{2}(n)^{T}e_{j}(n)v_{I}^{i}(n)$, $\omega_{mi}^{1}(n+1)=\omega_{mi}^{1}(n)-[J^{2}(n)^{T}J^{2}(n)+\mu U]^{-1}\eta J^{1}(n)^{T}\sum_{j=1}^{J}\left(J^{2}(n)e_{j}(n)\omega_{ij}^{2}(n)\right)v_{I}^{1}(n)v_{M}^{m}(n)$; and a threshold update formula is denoted as: $b_{j}^{2}(n+1)=b_{j}^{2}(n)-[J^{2}(n)^{T}J^{2}(n)+\mu U]^{-1}\eta J^{2}(n)^{T}e_{j}(n)v_{I}^{i}(n)$, $b_{i}^{1}(n+1)=b_{i}^{1}(n)-[J^{1}(n)^{T}J^{1}(n)+\mu U]^{-1}\eta J^{1}(n)^{T}\sum_{j=1}^{J}\left(J^{2}(n)e_{j}(n)\omega_{ij}^{2}(n)\right)v_{I}^{1}(n)v_{M}^{m}(n)$.
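A minimal sketch of the damped update $\omega(n+1)=\omega(n)-[J^{T}J+\mu U]^{-1}J^{T}e$ applied to a flattened parameter vector is given below; the optional learning rate η and the damping schedule for μ are assumptions, since the claim fixes only the update formula and the fact that μ = 0 removes the damping.

    import numpy as np

    def lm_step(w, J, e, mu, eta=1.0):
        # One damped Gauss-Newton / LM style step on the flattened parameters w,
        # given the Jacobian J and the error vector e of the current iteration.
        U = np.eye(J.shape[1])                      # unit matrix
        H = J.T @ J + mu * U                        # damped approximate Hessian
        return w - eta * np.linalg.solve(H, J.T @ e)

    def adapt_mu(mu, improved, factor=10.0):
        # Assumed schedule: shrink mu after a step that lowered the squared
        # error, grow it otherwise and retry from the previous parameters.
        return mu / factor if improved else mu * factor

Solving the linear system instead of forming the explicit inverse is a standard numerical choice and does not change the update the claim describes.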
 10. The neural network model optimization method based on the annealing process for the stainless steel ultra-thin strip according to claim 9, wherein a specific method for the performing the error calculation and the neural network testing in the step 9 comprises: calculating an error value, and judging whether an MSE error formula meets a precision requirement; when the MSE error formula meets the precision requirement, stopping the iteration; when the MSE error formula does not meet the precision requirement, continuing the iteration; after finishing training of the neural network, testing the testing set; and obtaining an actual predicted value by inversely normalizing an output result of the network.
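The step 9 check can be sketched as follows; the precision requirement goal and the function names are assumptions, while the inverse normalization simply undoes the [0, 1] mapping of claim 4.

    import numpy as np

    def mse(d, y):
        # Mean squared error between expected outputs d and network outputs y.
        return np.mean((np.asarray(d) - np.asarray(y)) ** 2)

    def should_stop(d, y, goal=1e-4):
        # Stop the iteration once the MSE meets the precision requirement.
        return mse(d, y) <= goal

    def inverse_normalize(y_norm, o_min, o_max):
        # Map the normalized network output back to physical units to obtain
        # the actual predicted value for the testing set.
        return np.asarray(y_norm) * (o_max - o_min) + o_min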